Abstract: Acting on time-critical events by processing ever-growing
social media or news streams is a major technical
challenge. Many of these data sources can be modeled as multi- 
relational graphs. Continuous queries or techniques to search for 
rare events that typically arise in monitoring applications have 
been studied extensively for relational databases. This work is
dedicated to answering the question that emerges naturally: how
can we efficiently execute a continuous query on a dynamic graph?
This paper presents an exact subgraph search algorithm that 
exploits the temporal characteristics of representative queries for 
online news or social media monitoring. The algorithm is based 
on a novel data structure called the Subgraph Join Tree (SJ- 
Tree) that leverages the structural and semantic characteristics 
of the underlying multi-relational graph. The paper concludes 
with extensive experimentation on several real-world datasets 
that demonstrates the validity of this approach. 

I. Introduction 

Social networks, social media websites and mainstream 
news media are driving an exponential growth in online 
content. This information barrage presents both a formidable 
challenge and an opportunity to applications that thrive on 
situational awareness. Examples of such applications include 
emergency response, cyber security, intelligence and finance 
[1], [2], where the data stream is monitored continuously for
specific events. Timeliness of the detection carries paramount 
importance for such applications. The applications derive their 
competitive edge from fast detection as late detection may not 
have much value due to incurred damage to resources. Our 
work is motivated by queries that look for rare events, have 
a time constraint on the time to discovery and never need a 
bulk retrieval of historic data due to their monitoring nature. 

Continuous queries evolved in the field of relational 
databases to address applications with precisely the above 
characteristics. A continuous query system is defined as one 
where a query logically runs continuously over time as opposed
to being executed intermittently [3], [4]. Thus, continuous
query processing is data-driven or trigger oriented.
Many of the prominent news or social media streams can be 
represented as multi-relational data sources. Multi-relational 
graphs are often an attractive representation for data sources 
with sparsity. The problem of monitoring events in such data 
streams can be viewed as continuously searching the dynamic 




Fig. 1. A graph query for monitoring emergencies in social media and news 
streams. 



graph for patterns that represent events of interest. Following 
is a use case where such continuous monitoring is required. 



Fig. 1 shows a graph pattern that represents such an event.
An operator may substitute the "keyword" with "fire" or 
"accident" and register several queries. Articles refer to articles 
in news as well as social media posts. This query will capture 
events that are reported from the same location. Observe that 
we specify the label for only one vertex in this query, the rest 
of the vertices have only type specified. Therefore, we are 
using the labeled vertex to anchor into a context and report 
when multiple events with that context are detected in the data 
stream. 

Graph search involves finding exact or approximate matches 
for a query subgraph in a larger graph. It has been studied 
extensively and is formally defined as the problem of subgraph 
isomorphism: given a pattern or query graph (henceforth 
described as query graph) Gq and a larger input graph (hence- 
forth described as the data graph) Gd, find all isomorphisms 
of Gq in Gd. Following the definition of isomorphism, the
matching involves finding a one-to-one correspondence be- 
tween the vertices of a subgraph of Gd and vertices of Gq 
such that all vertex adjacencies are preserved. Now consider 
the challenges in applying traditional graph search techniques 



An emergency control center in Seattle continuously 
receives updates from news and social media about 
accidents or other emergencies in Washington state, 
USA. As the messages stream in, an operator needs to 
detect when an accident happens as soon as possible, 
evaluate the emergency and mobilize the emergency 
responders within 10 minutes. 



to this problem. Unless carefully adapted, a standard search 
function will search the entire data graph repeatedly and 
retrieve the same search results. Also, many of the best 
performing graph search algorithms rely on indexing the 
graph. Even with an interval as large as 5 minutes, rebuilding 
the index of a massive graph repeatedly is infeasible. This 
motivates exploration of incremental algorithms for continuous 
queries, although the general problem of incremental subgraph 
isomorphism is proven to be NP-complete as well [5].

Queries like the one shown in Fig. 1 share a number of
common attributes. First, they all involve an implicit time win- 
dow to suggest the timeliness aspect associated with the query. 
Clearly, the length of the time window varies depending on 
the application context. The average monitoring time window 
for a high volume social media stream may be in tens of 
minutes whereas the equivalent period for online news may 
be in hours or days. Second, all these queries aim to discover 
a number of temporal events that share the same context, such 
as a common set of keywords and location. Lastly, a multi-relational
graph often takes the form of a k-partite graph [6],
[7], where each partite set represents a group of entities of the
same type. For queries such as the one described in Fig. 1, each event
that is represented by an article or a tweet can be viewed as
a k-partite subgraph.

We exploit these three features to implement a continuous 
query processing framework for multi-relational graphs. First, 
by utilizing a rolling time window we continuously prune 
partial search results that would otherwise need to be tracked 
and would contribute to the combinatorial growth in memory 
utilization. Second, the temporal property of the vertices and 
edges representing events suggests that it is logical to search 
for distinct subgraphs where such "temporal" vertices or edges 
are ordered, thus significantly reducing the search space. 
Finally, we take advantage of the multi-relational structure 
of the data and the characteristics of temporal events to 
avoid expensive joins. Given a multi-relational query graph 
we decompose it in a hierarchical fashion. We design a data
structure called the Subgraph Join Tree, henceforth referred
to as the SJ-Tree, to model the hierarchical decomposition and
store matches with various subgraphs of the query graph as
represented in the tree. We refer to the smallest units of
the decomposed query graph as "search primitives", which
almost always consist of more than one edge. As new edges
arrive over time, we continuously perform (a) "local searches" 
to look for matches with the search primitives and (b) use 
the decomposition structure to "join" them into progressively 
larger matches. This represents a middle ground between 
the periodic application of a graph search algorithm on the 
data graph and the approach that would have been employed 
by a traditional stream database. Stream databases have no 
alternative but to model each edge in the query graph as a 
separate join operator. Our model can express this degenerate 
case where an edge is represented as a search primitive in the 
SJ-Tree, but the performance is extremely poor. By grouping 
subgraphs into search primitives, we can simplify the query 
plan, significantly improve performance by multiple orders 



of magnitude, and perhaps most importantly, reason about 
the trade-offs involved and explore a large space of possible 
optimizations. 

A. Contributions 

Our contributions from this research are summarized below. 

1. We introduce a data structure called SJ-Tree for query
graph decomposition (section IV) and present a novel subgraph
search algorithm (section VI) for continuous queries on dynamic
multi-relational graphs.

2. We present query optimizations that significantly improve
the query processing performance by accounting for the temporal
nature of the data graph (section VII).

3. We present a query-decomposition algorithm (section V)
that, given the query graph and information about the data
graph, exploits both structural and semantic characteristics of
these graphs and produces an SJ-Tree for the query graph.

4. We compare our performance with the incremental subgraph
isomorphism algorithm developed by Fan et al. [5] and
show that our approach provides improvements by multiple
orders of magnitude (section VIII).

5. We present a series of experiments on representative online
news (New York Times), co-authorship networks (DBLP)
and social media data sources (Tencent Weibo) modeled as
multi-relational graphs. The scale of these datasets is orders
of magnitude larger than that of previously reported research [5], [8]
in the literature (section VIII).

6. We present a theoretical model for complexity analysis of 
both query decomposition and the search algorithm. We also 
provide an extensive experimental analysis of the algorithm's 
performance as a function of the frequency distribution of 
vertex labels for verification of the theoretical model. 




Fig. 2. More examples of monitoring queries on multi-relational graphs. 
The query at the top can be used to discover events in a certain context. Set 
one of the keywords to "Oil" and run the query to discover various events 
that center around oil, such as price movements, discovery, accident etc. By 
setting the keyword to "buyout", the bottom query can be used to detect when 
news surface about a merger between two companies. 



B. Problem Statement 

Every edge in a dynamic graph has a timestamp associated
with it and therefore, for any subgraph $g$ of a dynamic graph
we can define a time interval $\tau(g)$ which is equal to the interval
between the earliest and latest edge belonging to $g$. Given a
dynamic multi-relational graph $G_d$, a query graph $G_q$ and a
time window $t_W$, we report whenever a subgraph $g_d$ that is
isomorphic to $G_q$ appears in $G_d$ such that $\tau(g_d) < t_W$. The
isomorphic subgraphs are also referred to as matches in the
subsequent discussions. If $M(G_d^k)$ is the cumulative set of all
matches discovered until time step $k$ and $E_{k+1}$ is the set of
edges that arrive at time step $k+1$, we present an algorithm
to compute a function $f(G_d, G_q, E_{k+1})$ which returns the
incremental set of matches that result from updating $G_d$ with
$E_{k+1}$ and is equal to $M(G_d^{k+1}) - M(G_d^k)$. We assume that
the graph only receives edge inserts and no deletions.
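
The time-interval and time-window test in the problem statement can be made concrete with a small sketch. The `Edge` record and the function names below are illustrative assumptions, not part of the paper; the sketch only shows how the interval of a subgraph and the incremental match set could be computed.

```python
from typing import Iterable, NamedTuple


class Edge(NamedTuple):
    """Hypothetical timestamped edge of the dynamic graph."""
    src: str
    dst: str
    etype: str
    ts: float  # timestamp attached to the edge


def time_interval(edges: Iterable[Edge]) -> float:
    """Interval between the earliest and latest edge of a subgraph."""
    stamps = [e.ts for e in edges]
    return max(stamps) - min(stamps)


def within_window(edges: Iterable[Edge], t_w: float) -> bool:
    """True when the subgraph's interval is strictly below the window."""
    return time_interval(list(edges)) < t_w


def incremental_matches(before: set, after: set) -> set:
    """Matches present after step k+1 that were not present after step k."""
    return after - before
```

With a 10-minute window (600 seconds), a candidate match whose edges span 300 seconds would be reported, while one spanning 600 seconds or more would not.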

II. Background 

A. Multi-Relational Graphs 

Single-relational graphs have been widely used to model
systems comprised of homogeneous elements related by a 
single type of relation. A social network where vertices repre- 
sent people and edges represent connections between people 
is an example of a single-relational graph. A multi-relational 
graph becomes a useful construct for modeling heterogeneous 
relations between a possibly heterogeneous set of entities. 

Definition 2.1.1 Multi-Relational Graph A multi-relational
graph denoted as $G = (V, E)$ is a graph representation
of a multi-relational database. If the database contains
$K$ entity types $E_1, \ldots, E_K$, then the vertex set $V(G)$ is
partitioned into $K$ sets $V_1, \ldots, V_K$. For any vertex $v \in V_k$, the
label for the vertex is drawn from the domain of the entity
type $E_k$. The edges of the graph are the relations between
various entities as indicated in an entity-relation model. Thus,
an edge in the graph $e \in E(G)$, $e = (v_i, v_j)$, is an instance of
a relation $R_{ij}$ between entities $E_i$ and $E_j$.

A graph representation of such a multi-relational database 
takes the form of a K-partite graph [7] if there are no relations
between homogeneous entities or equivalently, if there are no 
edges between vertices that belong to the same partite set. In 
practice, such relationships are not rare. Examples of such 
linkages are citation links between articles and social ties 
between two members in a network. However, we omit unary 
relationships from our multi-relational model. Our omission 
of unary relationships is driven by usability and a desire for 
simplicity. Figs. 1 and 2 show a number of examples embodying
a range of events. Consider the example in Fig. 2 that detects 
a series of articles that refer to the same set of keywords; 
one may wish to introduce unary relationships in the graph 
to indicate citation between articles and thus, focus only on 
articles with high citation counts. However, such queries can 
be alternatively represented by adding a query constraint to the 
vertices that require them to have a minimum degree. Or, such 
relations could be represented using an intermediate vertex of 
a different type. Thus, for the scope of this work we define 
patterns of interest as query graphs that are subgraphs of the 
K-partite multi-relational graph. 
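
As an illustration of the definition above, the following minimal sketch stores a multi-relational graph with typed vertices and checks the K-partite condition (no edge joins two vertices of the same entity type). The class name, method names and the example relation names are assumptions made for this example only.

```python
from collections import defaultdict


class MultiRelationalGraph:
    """Hypothetical container: typed vertices and typed (relation) edges."""

    def __init__(self):
        self.vertex_type = {}   # vertex label -> entity type
        self.edges = []         # (u, v, relation) triples

    def add_vertex(self, label, vtype):
        self.vertex_type[label] = vtype

    def add_edge(self, u, v, relation):
        self.edges.append((u, v, relation))

    def partite_sets(self):
        """Group vertex labels by entity type (one partite set per type)."""
        parts = defaultdict(set)
        for label, vtype in self.vertex_type.items():
            parts[vtype].add(label)
        return dict(parts)

    def is_k_partite(self):
        """True when no edge connects two vertices of the same type."""
        return all(self.vertex_type[u] != self.vertex_type[v]
                   for u, v, _ in self.edges)
```

Adding a citation edge between two article vertices makes `is_k_partite()` return False, which is the kind of same-type relation the model above chooses to omit.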

B. Continuous Queries 

A continuous query can be described as computing a
function $f$ over a stream $S$ continuously over time and
notifying the user whenever the output of $f$ satisfies a
user-defined constraint [3]. They are distinguished from ad-hoc
query processing by their high selectivity (looking for unique 
events) and need to detect newer updates of interest as opposed 
to retrieving lots of past information. In this paradigm the 
primary objective is to notify a listener as soon as the query 
is matched. One may view conventional databases as passive 
repositories with large collections of data that work in a 
request-response model whereas continuous queries are data- 
driven or trigger oriented. These features coupled with real- 
time demands challenge many of the fundamental assumptions 
for conventional databases and establish continuous query 
processing on relational data streams as a major research 
area. The literature on database research from the past two 
decades is abundant with work on continuous query systems 
[9], [10]. Babcock et al. [11] provide an excellent overview
of continuous query systems and their design challenges. 

C. Graph Queries 

Graph querying techniques have been studied extensively in 
the field of pattern recognition over nearly four decades [12].
Our work is focused on subgraph isomorphism, which is
defined as follows.

Definition 2.2.1 Subgraph Isomorphism Given the
query graph $G_q$ and a matching subgraph of the data graph
($G_d$) denoted as $G'_d$, a matching between $G_q$ and $G'_d$ involves
finding a bijective function $f : V(G_q) \to V(G'_d)$ such that
for any two vertices $u_1, u_2 \in V(G_q)$, $(u_1, u_2) \in E(G_q) \Rightarrow
(f(u_1), f(u_2)) \in E(G'_d)$.
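
The condition in this definition can be checked directly for a candidate mapping. The sketch below is an illustration only: it assumes undirected, unlabeled edges given as vertex pairs, and the function name is hypothetical. It verifies that the mapping covers every query vertex, is one-to-one, and preserves every query adjacency.

```python
def is_subgraph_match(query_edges, data_edges, f):
    """Check that f maps the query graph onto a subgraph of the data graph.

    query_edges, data_edges: iterables of (u, v) pairs, treated as undirected.
    f: dict mapping each query vertex to a data vertex.
    """
    query_vertices = {v for edge in query_edges for v in edge}
    if set(f) != query_vertices:
        return False                      # every query vertex must be mapped
    if len(set(f.values())) != len(f):
        return False                      # the mapping must be one-to-one
    data = {frozenset(edge) for edge in data_edges}
    # Every query edge must map onto an existing data edge.
    return all(frozenset((f[u1], f[u2])) in data for u1, u2 in query_edges)
```

This only validates a given mapping; finding such mappings is the hard part that the search algorithms discussed next address.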

Two popular subgraph isomorphism algorithms were developed
by Ullmann [13] and Cordella et al. [14]. The
algorithm of Cordella et al. [14] employs a filtering and verification strategy
and outperforms the original algorithm by Ullmann. However,
both these approaches perform the search without using any 
pre-processed information. Over the past decade, the database 
community has focused strongly on developing indexing and 
query optimization techniques to speed up the searching 
process. A common theme of such approaches is to index 
vertices based on k-hop neighborhood signatures derived from 
labels and other properties such as degrees, spectral properties 
and centrality [15]-[19]. Other major areas of work involve
join-order optimization fTSj , pQ| and application of search 
techniques for alternative representations such as similarity 
search in a multi-dimensional vector space [21]. Apart from
neighborhood-based signatures, graph sketching is an important
area that focuses on generating different synopses of a graph
data set [22], [23]. Development of efficient graph sketching
algorithms and their application to query estimation is
expected to gain prominence in the near future.

III. Related Work 

Investigation of subgraph isomorphism for dynamic graphs 
did not receive much attention until recently. It introduces new 
algorithmic challenges because we cannot afford to index a
dynamic graph frequently enough for applications with real- 
time constraints. In fact this is a problem with searches on 



large static graphs as well [24]. There are two alternatives in
that direction. We can search for a pattern repeatedly or we
can adopt an incremental approach. The work by Fan et al. [5]
presents incremental algorithms for graph pattern matching.
However, their solution to subgraph isomorphism is based on
the repeated search strategy. Chen et al. [8] proposed a feature
structure called the node-neighbor tree to search multiple 
graph streams using a vector space approach. They relax the 
exact match requirement and require significant pre-processing 
on the graph stream. Our work is distinguished by its focus on 
temporal queries and handling of partial matches as they are 
tracked over time using a novel data structure. From a data- 
organization perspective, the SJ-Tree approach has similarities 
with the Closure-Tree [25]. However, the Closure-Tree approach
assumes a database of independent graphs and the underlying
data is not dynamic. There are strong parallels between our
algorithm and the very recent work by Sun et al. [24], where
they implement a query-decomposition based algorithm for 
searching a large static graph in a distributed environment. 
Our work is distinguished by the focus on continuous queries 
that involves maintenance of partial matches as driven by the 
query decomposition structure, and optimizations for real-time 
query processing. 

IV. Incremental Query Processing 
A. Naive Approach 

We begin with a simplistic solution to motivate an incremental
approach for continuous query processing, which we call
PROCESS-BATCH-NAIVE. For every new edge that is
added to $G_d$, we detect if the edge matches any edge in
the query graph. This check can be performed minimally by
examining 1) if there are edges in the query graph with the
same type as the new edge and 2) if the endpoint vertices
of the new edge match those of the corresponding edges in the
query graph based on their types and attributes. Once an edge
is considered as a matching candidate, the next step is to
consider different combinations of matches it can participate
in. Let's assume that the query graph contains the following
edges: $E_1^q = \{A, B\}$ and $E_2^q = \{B, C\}$, where $A$, $B$ and
$C$ are vertices in the query graph. Further assume that an
edge from the search graph $E_1^d = \{P, Q\}$ matches with $E_1^q$.
We represent this match as $M_0 = \{\{E_1^q, E_1^d\}\}$. Next, two more
search graph edges $E_2^d = \{Q, R\}$ and $E_3^d = \{Q, S\}$ arrive
and they match with $E_2^q$. Both find that they have a common
vertex with $M_0$, and add the respective matching edge data to
create two more matches, $M_1 = \{\{E_1^q, E_1^d\}, \{E_2^q, E_2^d\}\}$ and
$M_2 = \{\{E_1^q, E_1^d\}, \{E_2^q, E_3^d\}\}$. A simple illustration of this
matching process is shown in Fig. 3. We call this operation
of extending an existing match structure by adding new edge
mapping information AUGMENT-MATCH.

We describe this process in Algorithm 1. Let $M_s$ and $M_t$ be the
sets of matches that contain $v_s$ and $v_t$. Once a partial match that
contained one of the endpoint vertices ($v_s$) of the new edge is
augmented, we also need to see if it can be extended further by
exploring the neighborhood of the other vertex ($v_t$) connected
to this edge. This can be accomplished by looking up the set

Fig. 3. Illustration of the PROCESS-BATCH-NAIVE algorithm. Panel (b): evolution of the search graph and matches.



of partial matches ($M_t'$) that contain $v_t$ and trying to extend
the newly augmented match in conjunction with each member
of $M_t'$. The operation of combining two partial matches and
producing a larger match is indicated by EDGE-BASED-JOIN.
This process is symmetrically repeated for $v_t$.



Algorithm 1 PROCESS-BATCH-NAIVE(G_d, G_q, M, edges)



While intuitively simple, Algorithm 1 falls prey to combinatorial
explosion very quickly. It finds the match with the query
graph at the cost of creating many partial matches. Assume
that $G_d$ receives a large number of edges that match with
$E_1^q$. This would add a lot of partial matches that contain
mapping information for $E_1^q$. Subsequently, every future edge
that matches with $E_1^q$ will need to be matched or checked
against all the existing partial matches for an AUGMENT-MATCH
operation. While the subgraph matching problem
has an inherent exponential nature associated with it, a better
algorithm will restrict the growth of the number of partial
matches to track and still produce the correct result.



for all e ∈ edges do
    v_s = source[e]
    v_t = target[e]
    M_s = {m ∈ M | v_s ∈ V(m)}
    M_t = {m ∈ M | v_t ∈ V(m)}
    M_s' = M_s - M_t
    M_t' = M_t - M_s
    M_st = M_s ∩ M_t
    for i = 1 → size[M_s'] do
        list = AUGMENT-MATCH(G_d, G_q, M_s'[i], e, v_s)
        M = M ∪ EDGE-BASED-JOIN(list, M_t', e)
    for i = 1 → size[M_t'] do
        list = AUGMENT-MATCH(G_d, G_q, M_t'[i], e, v_t)
        M = M ∪ EDGE-BASED-JOIN(list, M_s', e)
    for i = 1 → size[M_st] do
        M = M ∪ AUGMENT-MATCH(G_d, G_q, M_st[i], e)
    matches = FIND-MATCHES(G_d, G_q, v_s)
    if size[matches] ≠ 0 then
        M = M ∪ matches
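
The growth in partial matches can be observed with a much-simplified sketch of the naive idea. This is not Algorithm 1 itself: it assumes directed, typed edges, a fixed two-edge path query, and a flat list of partial matches, and the names `QUERY` and `process_edge_naive` are hypothetical. Every incoming edge starts a new partial match and is also tried against every stored partial match.

```python
# Query path A -r1-> B -r2-> C, i.e. the two query edges of the running example.
QUERY = [("A", "B", "r1"), ("B", "C", "r2")]


def process_edge_naive(partials, edge):
    """Extend stored partial matches with one data edge (u, v, etype).

    partials: list of (covered_query_edge_indices, vertex_binding) pairs,
              updated in place. Returns the complete matches found.
    """
    u, v, etype = edge
    new = []
    for idx, (a, b, qtype) in enumerate(QUERY):
        if qtype != etype:
            continue
        binding = {a: u, b: v}
        new.append(({idx}, binding))          # start a fresh partial match
        for covered, m in partials:           # try to augment every stored one
            if idx in covered:
                continue
            shares = any(q in m for q in binding)
            agrees = all(m.get(q, d) == d for q, d in binding.items())
            injective = all(d not in m.values()
                            for q, d in binding.items() if q not in m)
            if shares and agrees and injective:
                new.append((covered | {idx}, {**m, **binding}))
    partials.extend(new)
    return [m for covered, m in new if len(covered) == len(QUERY)]
```

Feeding 50 edges of type `r1` that all end at `Q`, followed by a single `r2` edge leaving `Q`, leaves 101 stored partial matches, which is the kind of growth the SJ-Tree is designed to contain.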



B. Our Approach 

We consider exploiting the characteristics of the data and 
query graphs to reduce the amount of intermediate information 
for tracking partial matches. One possible solution is to create 
or augment a partial match with an incoming edge when 
the new information has a higher probability of leading to 
a complete match. This could involve searching for motif 
subgraphs that have a higher probability of leading to a 
complete match. 

Our objective is to introduce an approach that guides the 
search process to look for specific subgraphs of the query 
graph and follow specific transitions from small to larger 
matches. The main intuitions that drive this
approach are the following:

1) Instead of looking for a match with the entire graph or 
just any edge of the query graph, partition the query 
graph into smaller subgraphs and search for them. 

2) Track the matches with individual subgraphs and com- 
bine them to produce progressively larger matches. 

3) Define a join order in which the individual matching 
subgraphs will be combined. Do not look for every 
possible way to combine the matching subgraphs. 

Although the current work is completely focused on tem- 
poral queries, the graph decomposition approach is suited for 
a broader class of applications and queries. The key aspect 
here is to search for substructures without incurring too much 
cost. Even if some subgraphs of the query graph are matched 
in the data, we will not attempt to assemble the matches 
together without following the join order. Thus, if there are 
substructures that are too frequent, joining them and producing 
larger partial matches will be too expensive without a stronger 
guarantee of finding a complete match. On the other hand, if 
there is a substructure in the query that is rare or indicates 
high selectivity, we should start assembling the partial matches 
together only after that substructure is matched. Thus, for 
query graphs that have different substructures with varying 
frequency or selectivity, the problem of assembling partial 
matches is equivalent to a join order optimization problem
[26].

C. Subgraph Join Tree (SJ-Tree)

We introduce a tree structure called Subgraph Join Tree
(SJ-Tree) that supports the above intuitions for implementing
a join order based on selectivity of substructures of the query
graph.

Definition 4.1.1 An SJ-Tree $T$ is defined as a binary tree
comprised of the node set $N_T$. Each $n \in N_T$ corresponds to
a subgraph of the query graph $G_q$. Let's assume $V_{SG}$ is the
set of corresponding subgraphs and $|V_{SG}| = |N_T|$. Additional
properties of the SJ-Tree are defined below.

1) The subgraph corresponding to the root of the SJ-Tree is
isomorphic to the query graph. Thus, for $n_r = root\{T\}$,
$V_{SG}\{n_r\} \cong G_q$.

2) The subgraph corresponding to any internal node of $T$ is
isomorphic to the output of the join operation between
the subgraphs corresponding to its children. Thus, for a
given node $n$, if $n_l = left\{n\}$ and $n_r = right\{n\}$ are
the left and right child of $n$, then $V_{SG}\{n\} = V_{SG}\{n_l\} \bowtie
V_{SG}\{n_r\}$. Given two graphs $G_1 = (V_1, E_1)$ and $G_2 =
(V_2, E_2)$, the join operation is defined as $G_3 = G_1 \bowtie
G_2$, such that $G_3 = (V_3, E_3)$ where $V_3 = V_1 \cup V_2$ and
$E_3 = E_1 \cup E_2$.

3) Each node in the SJ-Tree maintains a set of matching
subgraphs. We define a function $matches(n)$ that for
any node $n \in N_T$, returns a set of subgraphs of the
search graph. If $M = matches(n)$, then $\forall G_m \in M$,
$G_m \cong V_{SG}\{n\}$.

4) Each internal node $n$ in the SJ-Tree maintains a subgraph,
CUT-SUBGRAPH($n$), that equals the intersection
of the query subgraphs of its child nodes.

For each $n \in N_T$ such that NUM-CHILDREN($n$) > 0,
CUT-SUBGRAPH($n$) $= V_{SG}\{n_l\} \cap V_{SG}\{n_r\}$,
where $n_l = left\{n\}$ and $n_r = right\{n\}$. When
NUM-CHILDREN($n$) = 0, CUT-SUBGRAPH($n$) $= \emptyset$.

5) For any internal node $n \in N_T$ such that
CUT-SUBGRAPH($n$) $\neq \emptyset$, we also define a projection
operator $\Pi$ as follows.

Assume that $G_1$ and $G_2$ are isomorphic, $G_1 \cong G_2$.
Also define $\Phi_V$ and $\Phi_E$ as functions that define the
bijective mapping between the vertices and edges of
$G_1$ and $G_2$. Consider $g_1$, a subgraph of $G_1$: $g_1 \subseteq G_1$.
Then $g_2 = \Pi(G_2, g_1)$ is a subgraph of $G_2$ such that
$V(g_2) = \Phi_V(V(g_1))$ and $E(g_2) = \Phi_E(E(g_1))$.

6) Each node $n \in N_T$ stores $matches(n)$ via a table of
key-value pairs where the key is the projection of the
query graph and the value is the matching subgraph in
the data graph. Observe that it is possible to have many
matching subgraphs with the same key, and the table
maintains one-to-many mappings for each projected data
subgraph.

Conceptually, this is an index structure where keys track 
the occurrence of matching subgraphs in the data graph. Our 
decision to use a binary tree as opposed to an n-ary tree is
motivated by simplicity and by the lower combinatorial
cost of joining matches from multiple children. The choice between
left-deep and bushy trees is a well-studied problem in the literature
fTl]. Analyzing the trade-offs between different tree models 
is part of our ongoing work. 
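
One possible in-memory layout for such a tree node is sketched below. It is an illustration under simplifying assumptions rather than the authors' implementation: query subgraphs are stored as sets of vertex pairs, the cut subgraph is reduced to its vertex set, and the names `SJNode`, `join`, `projection_key` and `store` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


def vertices(edges):
    """Vertex set of a subgraph given as a set of (u, v) edges."""
    return frozenset(v for e in edges for v in e)


@dataclass
class SJNode:
    query_edges: frozenset                  # query subgraph held by this node
    left: Optional["SJNode"] = None
    right: Optional["SJNode"] = None
    # Match table: projection key -> list of matches sharing that key.
    matches: dict = field(default_factory=lambda: defaultdict(list))

    @property
    def cut_vertices(self):
        """Vertices shared by both children (empty for a leaf)."""
        if self.left is None or self.right is None:
            return frozenset()
        return vertices(self.left.query_edges) & vertices(self.right.query_edges)


def join(left, right):
    """Parent node whose query subgraph is the union of its children's."""
    return SJNode(left.query_edges | right.query_edges, left, right)


def projection_key(match, cut):
    """Data vertices bound to the cut vertices, in a fixed order."""
    return tuple(match[q] for q in sorted(cut))


def store(node, cut, match):
    """File a match (query vertex -> data vertex) under its projection key."""
    node.matches[projection_key(match, cut)].append(match)
```

Keying each table by the projection onto the cut means that two child matches can only be joined when they agree on the shared vertices, so the join reduces to a hash lookup.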

V. Query Graph Partitioning 

With the SJ-Tree data structure in mind, the next task is
to automatically decompose a query graph Gq to create an
SJ-Tree. Broadly, our aim is to decompose the query graph
into a number of smaller graphs, which we refer to as search
primitives, and perform local searches for these primitives.
We use the term local search to refer to a subgraph search
performed in the neighborhood of an edge in the data graph for
a small query subgraph. The primitives are restricted to small
and selective query subgraphs to keep the local search efficient.
Algorithm 2 outlines the process to partition the query into
several smaller graph primitives, and then create the SJ-Tree



Algorithm 2 SJ-Tree Creation 

Fig. 4. Illustration of query decomposition as it happens in the SJ-Tree. The
root represents the original query graph whereas its children and other nodes
represent progressively smaller query subgraphs.



based on our selectivity criterion in a bottom-up fashion. An 
important goal of the decomposition process is to push the 
most selective subgraph to the lowest level of the SJ-Tree to 
reduce the number of partial matches. 

In order to compute these search primitives, we extract 
the neighborhood of each vertex based on its selectivity. 
We compute the selectivity of a vertex based on a score 
function that is similar to the term frequency-inverse document 
frequency (TF-IDF) weighting in information retrieval. The 
score function shown in Procedure SCORE (line 18) rewards
a vertex for having high degree and a neighborhood which
starts to form earliest in timestamp order in the query graph.
It however penalizes a vertex for having a high degree label or 
type in the data graph. The intuition behind the score function 
is that a high degree vertex in the query graph will have a 
lower number of matches in the data graph than the lower 
degree vertices of the query graph. At the same time, for real 
time event detection applications, we want to be able to detect 
an event exactly as it happens and know how close we are to 
seeing the entire event manifest. This makes it necessary to 
factor in the time-stamp associated with the neighborhood, as 
edges and vertices that appear earlier need to be matched as 
early as possible. Finally a high degree vertex label (labels 
uniquely identify vertices) in the data graph will make the 
local search costly as a large number of its incident edges have 
to be checked for matches, hence this is inversely proportional 
to the selectivity. If only the vertex type is known we use the 
average degree of the vertex type of the vertex. 
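
A hedged Python rendering of this score is given below. The input layout is an assumption made for illustration: query edges carry positive ordinal timestamps, `labels` and `types` describe the query vertices, and `label_degree` and `type_avg_degree` hold precomputed data-graph statistics. None of these names come from the paper.

```python
def score(v, query_edges, labels, types, label_degree, type_avg_degree):
    """Selectivity score of query vertex v.

    query_edges: list of (u, w, timestamp) with positive timestamps.
    labels: query vertex -> label, only for vertices whose label is fixed.
    types: query vertex -> entity type.
    label_degree: label -> degree of that labelled vertex in the data graph.
    type_avg_degree: type -> average degree of that type in the data graph.
    """
    incident = [(a, b, t) for a, b, t in query_edges if v in (a, b)]
    degree = len(incident)
    max_time = max(t for _, _, t in query_edges)
    min_time = min(t for _, _, t in incident)
    # Reward high query degree and an early-forming neighborhood.
    s = degree * (max_time / min_time)
    # Penalize labels or types that are high degree in the data graph.
    if v in labels:
        return s / label_degree[labels[v]]
    return s / type_avg_degree[types[v]]
```

For a query with two articles that both mention a fixed keyword and share a location, where article 1 carries the earlier timestamps and keywords and locations are high degree in the data, the article vertices score highest and article 1 outranks article 2.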

Once we have an ordered list of vertices, we extract the
neighborhood of the vertex with highest selectivity and create 
a node containing this subgraph (line 4). We then remove this 
vertex, the edges selected from the query graph as well as 
any nodes in the query graph which might have become 0- 
degree by these removals in the function TRUNCATE-GRAPH 
(lines 5 and 10). If the SJ-Tree is empty then we denote this 
new node as the root. Otherwise we create a new root using 

procedure CREATE-SJ-TREE(G_q, G_d, k, root)
    T = INIT-SJ-TREE(∅)
    v = {v ∈ V(G_q) | argmax(Score(v, G_q, G_d))}
    root = CREATE-NODE(T, NEIGHBORS(v, G_q))
    G_q^1 = TRUNCATE-GRAPH(G_q, E(V_SG{root}))
    i = 1
    while E(G_q^i) ≠ ∅ do
        v = {v ∈ V(G_q) | argmax(Score(v, G_q, G_d)) and NEIGHBORS(v, G_q) ∩ V_SG{root} ≠ ∅}
        right = CREATE-NODE(T, K-NBRS(v, k, G_q^i))
        G_q^{i+1} = TRUNCATE-GRAPH(G_q^i, E(V_SG{right}))
        left = root
        root = CREATE-NODE(GRAPH-UNION(left, right))
        root.cut-subgraph = GRAPH-INTERSECTION(left, right)
        BUILD-SUBTREE(root, left, right)
        i = i + 1
    return root
end procedure
procedure SCORE(v, G_q, G_d)
    N = NEIGHBORS(v, G_q)
    score = deg(v) × (max-time(G_q) / min-time(N))
    if label(v) is known from G_q then
        score = score / deg(label(v))
    else
        score = score / deg(type(v))
    return score
end procedure



the union of the subgraphs in the current root and the newly 
created node. We then make the current root the left child of 
the new root and the node containing the primitive the right 
child (lines 11-14). Next, the scores are recomputed for the 
remaining vertices in the now smaller query graph and this 
procedure is repeated iteratively. This algorithm produces an
SJ-Tree as shown in Fig. 4 if we make the assumption here
that the vertices containing a particular keyword and location
label have very high degree, and the "article" vertex type has
much lower average degree. Article 1 denotes the article with
the lowest edge timestamps, hence its two-edge neighborhood
forms the left node at the lowest level of the SJ-Tree. We skip
a detailed demonstration of the algorithm in the interests of 
space, but it is easy to see how the algorithm produces the 
rest of the SJ-Tree shown. 
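
The greedy, bottom-up construction described in this section can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 2: primitives are restricted to 1-hop neighborhoods, the tree is returned as nested tuples, and the caller supplies the scoring function. The names `decompose` and `score_fn` are hypothetical.

```python
def decompose(query_edges, score_fn):
    """Greedy left-deep decomposition of a connected query graph.

    query_edges: iterable of (u, v) pairs.
    score_fn(v, remaining_edges) -> selectivity score of vertex v.
    Returns (primitives in extraction order, nested-tuple tree).
    """
    remaining = set(query_edges)
    covered = set()        # query vertices already present in the current root
    primitives = []
    tree = None            # nested tuples: (left_subtree, right_leaf)
    while remaining:
        def touched(v):
            return {u for e in remaining if v in e for u in e}

        candidates = sorted({v for e in remaining for v in e})
        if tree is not None:
            # The new primitive must share a vertex with the current root,
            # otherwise the join would have an empty cut subgraph.
            candidates = [v for v in candidates if touched(v) & covered]
        if not candidates:
            raise ValueError("query graph is not connected")
        best = max(candidates, key=lambda v: score_fn(v, remaining))
        primitive = frozenset(e for e in remaining if best in e)
        remaining -= primitive                    # truncate the query graph
        covered |= {u for e in primitive for u in e}
        primitives.append(primitive)
        tree = primitive if tree is None else (tree, primitive)
    return primitives, tree
```

Because the score function receives the remaining edges, it can be recomputed on the shrinking query graph at every iteration, as the text describes.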

The time complexity of the scoring algorithm is $O(d_q)$, 
where $d_q$ is the average degree of a vertex in the query. We 
assume that the degrees of data graph vertex labels and the average 
degrees of the data graph vertex types are precomputed. The 
algorithm takes $O(n)$ time to find the maximum score and to 
remove 0-degree vertices in the worst case, where $n$ is the order 
of the query graph. The loop body execution time is dominated 
by the search for the maximum scoring vertex that has a 
vertex-disjoint neighborhood with the graph in its sibling node 
in the SJ-Tree, which is $O(n^2)$. Thus, the total time taken by the 
loop is $O(m \cdot n^2)$, where $m$ is the size of the query graph. 
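
The SCORE procedure can be illustrated with a short Python sketch. This is only one reading of the pseudocode, under stated assumptions: the query graph is given as a list of timestamped edges, min-time(N) is taken to be the earliest timestamp on an edge incident to the vertex, timestamps are positive, and the degree statistics of the data graph are supplied as precomputed dictionaries. The function name, parameter names and data layout are hypothetical and not taken from the paper.

```python
def score(v, query_edges, vertex_label, vertex_type, label_degree, type_degree):
    """Score a query vertex in the spirit of the SCORE procedure.

    query_edges:  list of (u, w, timestamp) tuples for the query graph.
    vertex_label: dict vertex -> label, or missing/None if unlabeled.
    vertex_type:  dict vertex -> type name.
    label_degree, type_degree: precomputed data-graph degree statistics.
    """
    incident = [(u, w, t) for (u, w, t) in query_edges if v in (u, w)]
    if not incident:
        return 0.0
    deg_v = len(incident)
    max_time = max(t for (_, _, t) in query_edges)
    # Assumption: min-time(N) is the earliest timestamp around v.
    min_time = min(t for (_, _, t) in incident)
    s = deg_v * (max_time / min_time)
    label = vertex_label.get(v)
    if label is not None:
        s /= label_degree[label]          # labeled vertex: use label degree
    else:
        s /= type_degree[vertex_type[v]]  # otherwise: average degree of type
    return s
```

With high-degree keyword and location vertices and low-degree article vertices, the article with the earliest edges receives the highest score, which matches the informal description of how the lowest level of the SJ-Tree is chosen.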

Currently the SJ-Tree is created only once per query graph 
based on the initial batch of edges. For adaptive stream based 
processing, the SJ-Tree can be created periodically. If the 
underlying data distribution changes significantly to force a 
different query plan then we will need to use the newly 
generated tree and "migrate" existing matches to the new 
structure. Investigating these details is part of our future work. 

VI. Continuous Query Algorithm 

We present a subgraph search algorithm (Algorithms 3 and 4) that 
utilizes the SJ-Tree structure (referred to as T). The search 
process is illustrated in Fig. 5. Fig. 5(a) shows the query graph 
and Fig. 5(b) shows a snapshot of the data graph where a match 
appears over time. 

Algorithm 3 PROCESS-CONT-QUERY(G_d, T, edges) 
leaf-nodes = GET-LEAF-NODES(T) 
for all e_s ∈ edges do 
    UPDATE-GRAPH(G_d, e_s) 
    for all n ∈ leaf-nodes do 
        g_sub^q = GET-QUERY-SUBGRAPH(T, n) 
        matches = LOCAL-SEARCH(G_d, g_sub^q, e_s) 
        if matches ≠ ∅ then 
            for all m ∈ matches do 
                T.UPDATE-SJ-TREE(n, m) 



A. Local Search 

Our proposed subgraph matching algorithm contains two 
primary tasks. First, for every incoming edge we perform a 
local search to detect a match with the smallest subgraphs 
associated with the leaves of the SJ-Tree. Given a match, we 
check to see if it can be combined with any of the existing 
matches maintained in the SJ-Tree to produce a larger match. 
When a match is found with the subgraph corresponding to 
the leaf node of the SJ-Tree, we initialize a match structure 
and insert it into the collection maintained at that leaf node. 
This process is described in Algorithm 3. 
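
The per-edge loop of Algorithm 3 can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: it assumes every leaf subgraph is a star centered on an event vertex, that the first endpoint of each incoming edge is that event vertex, and that a star is described only by the set of edge types it must contain. It returns at most one match per search instead of enumerating all isomorphic matches, and it does not deduplicate repeated reports. The names `DataGraph`, `local_search_star` and `process_stream` are hypothetical.

```python
from collections import defaultdict


class DataGraph:
    """Undirected multigraph stored as adjacency lists."""

    def __init__(self):
        # vertex -> list of (neighbor, edge_type, timestamp)
        self.adj = defaultdict(list)

    def add_edge(self, u, v, edge_type, ts):
        self.adj[u].append((v, edge_type, ts))
        self.adj[v].append((u, edge_type, ts))


def local_search_star(graph, center, required_types, now=None, window=None):
    """Look for one star match around `center`.

    required_types: set of edge types the star must contain, one edge each.
    If both `now` and `window` are given, edges older than now - window
    are skipped. Returns {edge_type: (neighbor, timestamp)} or None.
    """
    found = {}
    for nbr, etype, ts in graph.adj.get(center, []):
        if now is not None and window is not None and ts < now - window:
            continue
        if etype in required_types and etype not in found:
            found[etype] = (nbr, ts)
    return found if len(found) == len(required_types) else None


def process_stream(graph, edges, required_types, window=None):
    """Update the graph with each edge, then search around the new edge."""
    matches = []
    for u, v, etype, ts in edges:
        graph.add_edge(u, v, etype, ts)
        match = local_search_star(graph, u, required_types, now=ts, window=window)
        if match is not None:
            matches.append((u, match))
    return matches
```

In a full implementation each reported match would be handed to the SJ-Tree update step instead of being collected in a list.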

We implement an exact search algorithm for performing the 
local search. It runs a subgraph isomorphism check around the 
neighborhood of the incoming edge. The degree information 
of vertices, the types of vertices and edges, and labels when 
available are used to filter candidates in the search process. 
The query decomposition often reduces the local search to 
performing star queries where the center of the query is the 
vertex representing a temporal event. The peripheral vertices 
of the star query are the other entities that represent various 
attributes of the event. For most of the events seen across a wide 
range of data analyses, the vertices representing a temporal 
event have relatively small degree. Therefore, the search is 
typically fast. Further, in the context of real-time search, if 
the current time is $t$ and the query specifies a time window 
of length $t_w$ (see Section VII-B for details on temporal query 
optimizations), then all edges that have a timestamp older than 
$(t - t_w)$ are ignored by the search. 

B. Partial Match Aggregation 

Recall that the SJ-Tree is a binary tree. Therefore, upon 
insertion of a match into a leaf node we check to see if it can be 
combined with any matches that are contained in the collection 
maintained at its sibling node. The process of combining 
matches is described in the next subsection. Due to the way 
the SJ-Tree is constructed (Prop. 2, Sec. IV-C), a combined 
subgraph will be a match for the parent node in the SJ-Tree. 
Thus a successful combination of matching subgraphs between 
the leaf or intermediate node and its sibling node leads to the 
insertion of a larger match at the respective parent node. This 
process is repeated as long as larger matching subgraphs can 
be produced by moving up in the SJ-Tree. A complete match 
is found when two matches belonging to the children of the 
root node are combined successfully. This process is described 
in Algorithm 4. 

For implementation purposes, each node in the SJ-Tree 
stores the ids of the sibling node and the parent node. The 
query subgraphs corresponding to the leaf nodes in the SJ-Tree 
are isomorphic for the query graphs shown in Figs. 1 and 2. 
For such cases, only the bottom left-most leaf node in the 
SJ-Tree stores the smallest matches. This avoids redundancy 
and ensures that a matching subgraph isn't compared with 
itself repeatedly. When the smallest partial match is created, 
we first check whether it can be joined with existing partial 
matches in the tree before inserting it into the tree. This is 
done by checking against the largest partial matches first and then 
proceeding in descending order of partial match size. In terms 
of the left-deep SJ-Tree structure, the checks begin with the 
partial matches stored at the left child of the root and then 
progressively move down the tree (Fig. 4). 

Algorithm 4 UPDATE-SJ-TREE(node, m) 
sibling = sibling[node] 
parent = parent[node] 
k = GET-JOIN-KEY(CUT-SUBGRAPH[parent], m) 
H_s = H[sibling] 
M_s^k = GET(H_s, k) 
for all m_s ∈ M_s^k do 
    m_sup = JOIN(m_s, m) 
    if parent = root then 
        PRINT('MATCH FOUND : ', m_sup) 
    else 
        UPDATE-SJ-TREE(parent, m_sup) 
ADD(matches[node], m) 
ADD(H[node], k, m) 

C. Partial Match Join 

This subsection describes the process for joining matches 
across two sibling nodes in the SJ-Tree (Prop. 2, Sec. IV-C). 
Each node in the SJ-Tree maintains a hash table that supports 
storing key-value pairs. The table is required to store multiple 
values for every key, i.e., it is a multimap. Whenever a new 
matching subgraph $g$ is added to a node $v$ in the SJ-Tree, 
we compute a key value using its projection $\Pi(g)$ (Prop. 5, 
Sec. IV-C) and insert the key and matching subgraph into the 
hash table. The following paragraph describes the steps for 
computing the key. 

Fig. 5. a) Example query b) Occurrence of a match in the search graph c) Combining two partial matches to form the complete match 


The projection is obtained by extracting the 
matching vertices and edges with respect to the 
subgraph CUT-SUBGRAPH(parent(v)): 
$\Pi(g, \text{CUT-SUBGRAPH}(parent(v)))$. The projection of a subgraph 
is represented as an ordered collection of vertices and 
edges. A string representation of this collection is hashed 
to obtain the key for the projected subgraph. Next, the key 
and the matching subgraph are inserted into the multimap 
as a key-value pair. These operations are indicated by 
GET-JOIN-KEY in Algorithm 4. As is evident, the sibling and 
parent arrays (lines 1 and 2) store the ids of the sibling and parent 
for every node. An array of multimaps is maintained for all 
nodes in the SJ-Tree, and H[sibling] or H_s is the multimap 
for the sibling node. The efficiency of the join operation 
between two subgraphs is critical as it is the most frequently 
invoked operation. It implements an array of constraint checks 
to ensure that the resultant subgraph match is valid. As a 
consequence of such checks, most of the calls to the join 
operation do not produce a larger subgraph. 
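
The interplay between the per-node multimap, the join key and the recursive update of Algorithm 4 can be sketched in Python as follows. This is a minimal sketch under several assumptions: a partial match is represented as a dictionary from query vertices to data vertices, the cut subgraph is reduced to its set of query vertices, the join key is the tuple of data vertices that the cut vertices map to (rather than a hashed string), and the only constraint checks are consistency on shared query vertices and injectivity. The class and function names are hypothetical, and the single-leaf optimization for isomorphic leaves is not modeled.

```python
from collections import defaultdict


class SJNode:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.sibling = None
        self.cut = ()                    # query vertices shared by the children
        self.table = defaultdict(list)   # multimap: join key -> partial matches


def make_parent(name, left, right, cut):
    """Create a parent node whose children are `left` and `right`."""
    parent = SJNode(name)
    parent.cut = tuple(sorted(cut))
    left.parent = right.parent = parent
    left.sibling, right.sibling = right, left
    return parent


def join_key(cut, match):
    """Project a match onto the cut vertices and return a hashable key."""
    return tuple(match[q] for q in cut)


def join(m1, m2):
    """Merge two partial matches, or return None if they are incompatible."""
    merged = dict(m1)
    for q, d in m2.items():
        if merged.get(q, d) != d:        # disagree on a shared query vertex
            return None
        merged[q] = d
    if len(set(merged.values())) != len(merged):
        return None                      # two query vertices share a data vertex
    return merged


def update_sj_tree(node, match, results):
    """Insert `match` at `node` and propagate joins toward the root."""
    if node.parent is None:              # reached the root: complete match
        results.append(match)
        return
    key = join_key(node.parent.cut, match)
    for other in node.sibling.table.get(key, []):
        combined = join(match, other)
        if combined is not None:
            update_sj_tree(node.parent, combined, results)
    node.table[key].append(match)
```

Because candidates are looked up by key, only sibling matches that already agree on the cut subgraph are ever passed to `join`, which is the purpose of the multimap.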

D. Complexity Analysis 

There are two primary tasks in processing every edge in 
the continuous query algorithm: (1) performing a local search 
for a small subgraph of the query graph and, in case of a 
successful search, (2) updating the SJ-Tree with the partial 
match. Specifically, the local search involves retrieving the 
neighborhood of the endpoint vertices of an incoming edge 
and then searching for a query subgraph. We assume that the 
neighborhood retrieval is an $O(1)$ operation. For the 
multi-relational queries described in this paper, the local search 
reduces to performing a star query. Fig. 5 (above) shows an 
example star query that searches for a post with a location 
and a certain keyword. The typical degree of such star queries 
is small and thus the local search is cheap. Therefore, we can 
approximate the cost of the continuous query processing for 
every edge by a small constant in case of a failed local search 
and by that of UPDATE-SJ-TREE() for a successful search. 
UPDATE-SJ-TREE() requires joins between partial matches 
at $h$ levels, where $h$ is the height of the SJ-Tree. We insert a 
partial match $m_k$ at node $k$ in the SJ-Tree and then look up 
partial matches in the sibling node that have the same key. 
Partial matches that result from a successful join between 
$m_k$ and the set of candidates are added to the parent node. 
Assuming $M_k$ denotes the cost of this operation at node $k$, 
the time complexity for updating the tree is $O((\bar{M}_k)^h)$, where 
$\bar{M}_k$ denotes the expected value of $M_k$ over all nodes. 

This requires potentially creating $n_k$ subgraphs of size 
$|E(V_{SG}(parent(k)))|$. Hence, $M_k$ can be modeled as 
$O(n_k |E(V_{SG}(parent(k)))|)$. If $g_s$ is the query subgraph 
corresponding to the sibling node, then $n_k$ is determined by the 
number of subgraphs in the data graph that satisfy the label 
constraints from the query graph and are isomorphic to $g_s$. 
Accurate estimation of the frequency of an arbitrary subgraph 
is hard. Therefore, we resort to obtaining a loose bound in 
terms of the label constraints. Assume that $v_q$ has the lowest 
degree among all labeled vertices in the query graph and $v_s$ is 
the corresponding vertex in the data graph. Then $n_k$ is bounded 
by $\binom{d_s}{d_q}$, where $d_s$ and $d_q$ are the degrees of $v_s$ and $v_q$. In the 
following section, we show how to control the complexity of 
the algorithm by adding stricter constraints that reduce $n_k$. In 
the experimental analysis section, we verify the performance 
of our algorithm by selecting vertices with progressively 
higher degree. The space complexity of the SJ-Tree can be 
derived from the total memory requirement for storing partial 
matches in the leaves and internal nodes of the SJ-Tree. Given 
that there are $2h + 1$ nodes in the SJ-Tree, the total storage 
requirement is $\sum_{i=1}^{2h+1} \sum_{j \in keys(H[i])} S_j$, where the outer sum 
ranges over all nodes in the SJ-Tree, the inner sum iterates over 
the keys in the hash table at every node, and $S_j$ is the cost of 
storing each partial match. On average, the size of the inner 
sum is proportional to $\bar{M}_k$, and $S_j$ is $O(|E(G_q)|)$; hence the 
storage complexity is $O(h \bar{M}_k |E(G_q)|)$. 

VII. Temporal Query Optimizations 

While the SJ-Tree based approach provides drastic improvement 
over the naive approach, it does not escape the combinatorial 
problems associated with subgraph isomorphism. In this 
section, we introduce two optimizations that take advantage 
of continuous query features. 

A. Temporal Ordering of Partial Matches 

Suppose we have a query that attempts to find a sequence 
of two events with a common set of attributes. Assume that 
two matching events $e_1$ and $e_2$ are found with timestamps $\tau_1$ 
and $\tau_2$ respectively, with $\tau_1 < \tau_2$. For all practical purposes 
we should report the sequence $\{e_1, e_2\}$ and ignore the 
out-of-order combination. The same idea can be applied to subgraph 
matching to produce matches with a canonical ordering of 
events. Assume we have two partial matches $M_1$ and $M_2$ with 
edge sets $\{e_i, e_j\}$ and $\{e_m, e_n\}$ respectively. For such an input, 
the join algorithm is made to reject all combinations of these 
two sets that do not represent a monotonic order based on 
timestamps. This is accomplished by computing a range of 
timestamps for each partial match. If $t_{low}[M_i]$ and $t_{high}[M_i]$ 
are the lowest and highest timestamps for match $M_i$, then we 
require that $t_{high}[M_1] < t_{low}[M_2]$ for joining $M_1$ and $M_2$. 
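
The ordering test amounts to comparing two timestamp ranges. A minimal Python sketch, assuming each partial match is given as a list of `(u, v, timestamp)` edges (a hypothetical representation, as are the function names):

```python
def time_range(match_edges):
    """Return (t_low, t_high) for a partial match given as (u, v, timestamp) edges."""
    stamps = [t for (_, _, t) in match_edges]
    return min(stamps), max(stamps)


def can_join_in_order(m1_edges, m2_edges):
    """Accept the pair only if every edge of m1 precedes every edge of m2."""
    _, high1 = time_range(m1_edges)
    low2, _ = time_range(m2_edges)
    return high1 < low2
```

Caching the range with each stored match would let the join reject an out-of-order pair with a single comparison, without inspecting individual edges.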

B. Temporal Window based Pruning 

As observed earlier, the expected number of partial matches 
stored in the leaves of the SJ-Tree has a significant impact 
on the query processing performance. Given our focus on 
real-time event discovery, we do not need to store partial 
matches that are older than the time horizon associated with 
the application. For example, our window of interest spans 2-3 
days for events from a news stream, a few hours for social 
media or finance, and minutes in cyber security applications. A 
query parameter $t_w$ is introduced to incorporate this intuition; 
it specifies the length of the time window for maintaining 
partial matches in memory. The query processor periodically 
prunes the SJ-Tree to remove matches that are older than $t_w$ 
relative to the current time. 
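
A periodic pruning pass over the per-node multimaps might look like the following Python sketch. It assumes each stored partial match carries one reference timestamp; which timestamp defines the age of a match (for example its earliest or latest edge) is not specified in the text and is left to the caller here. The function name and the table layout are hypothetical.

```python
def prune(tables, now, window):
    """Drop partial matches whose reference timestamp is older than now - window.

    tables: list of dicts, one per SJ-Tree node, mapping
            join key -> list of (match, timestamp) pairs.
    Returns the number of matches removed.
    """
    cutoff = now - window
    removed = 0
    for table in tables:
        for key in list(table):          # copy keys: entries may be deleted
            kept = [(m, t) for (m, t) in table[key] if t >= cutoff]
            removed += len(table[key]) - len(kept)
            if kept:
                table[key] = kept
            else:
                del table[key]
    return removed
```

Calling `prune` every fixed number of processed edges, rather than on every edge, keeps the cost of the scan amortized over many updates.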

VIII. Experimental Results 

We seek to answer the following questions from the 
experimental analysis. 

1) How does our continuous graph query algorithm 
compare with the state of the art? To answer this, we 
compare our algorithm's performance with the IncIsoMatch 
algorithm presented in [5]. 

2) How does our query algorithm perform on real-world 
datasets? How can we stress test our query 
implementation? We provide the answers from exhaustive 
experimentation on three real-world datasets through 
systematic query selection. 



TABLE I 
Summary of k-partite characteristics of test datasets 

Graph dataset     Vertices   Edges    Vertex types   Edge types 
New York Times    39,523     68,682   4              4 
DBLP              3.158M     3.26M    2              1 
Tencent Weibo     2.5M       89.6M    4              5 


We carried out experimental studies on three different data 
sources: 1) a collection of online news articles spanning 
three months, 2) the DBLP co-authorship network, and 3) a 
social media dataset obtained from Tencent Weibo, a Chinese 
social network. 



Fig. 6. The query template used to find temporal events. Experiments are 
performed using queries with 4 event vertices and 2 feature vertices. One 
feature vertex is labeled and all other vertices specify only types. 

A. Testing Methodology 

Our metric is the time to process increments of a fixed 
number of edges (1k or 100k, whichever is closer to 1% of 
the test dataset size). The times reported only include the time 
spent in the query processing section. No temporal pruning 
is applied to partial matches unless explicitly specified. We 
use the query template shown in Fig. 6. To develop a 
performance model in terms of the label distribution, we 
sample the degree distribution of every vertex type and divide 
the range of the degree distribution into ten intervals. For 
each interval, the five closest candidate vertices are selected for 
testing purposes. Selection of multiple vertices around each bin 
allows us to systematically observe the impact of increasing 
the degree of the labeled vertex in the query graph. We report 
only a subset of the entire testing results in the interest of 
space. 
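
The label selection procedure can be sketched in Python as follows. The text does not say what the candidates are closest to, so this sketch assumes they are ranked by distance to the midpoint of each interval, and it may pick vertices just outside a sparsely populated interval. The function name and parameters are hypothetical.

```python
def select_test_vertices(degrees, num_intervals=10, per_interval=5):
    """Pick candidate labeled vertices across the degree range.

    degrees: dict vertex -> degree, for one vertex type.
    Returns a list with one list of vertices per interval.
    """
    lo, hi = min(degrees.values()), max(degrees.values())
    width = (hi - lo) / num_intervals
    selections = []
    for i in range(num_intervals):
        midpoint = lo + (i + 0.5) * width
        # Rank by distance to the interval midpoint; ties broken by vertex id.
        ranked = sorted(degrees, key=lambda v: (abs(degrees[v] - midpoint), v))
        selections.append(ranked[:per_interval])
    return selections
```

Each selected vertex then supplies the label for one run of the query template, so that runs can be compared by the degree of the labeled vertex.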

B. Experimental Setup 

All the results were obtained using a single core on a 
48-core shared memory system comprising 2.3 GHz AMD 
Opteron 6176 SE processors and 256 GB memory. The 
processor cache size is 512 KB and each system node has 32 
GB memory attached to it. The code was compiled with g++ 
4.1.2-52 with the -O3 optimization flag on Linux 2.6.18-308. 

C. New York Times 

We use a news dataset from the New York Times collected 
over Aug-Oct 2011.¹ Each article in the dataset contains a 
number of facets that belong to four types of entities. Each 
article and each facet is represented as a vertex in the 
graph. Each edge that connects an article with a facet carries a 
timestamp that is the publication time of the article. Following 
the template shown in Fig. 6, we run a query that finds four 
articles where all the articles have a common keyword and 
location. For the location vertex we specify the labels shown 
in the top diagram in Fig. 7 and observe the performance. The 
x-axis shows the growth of the graph in terms of the number of 
edges added and the y-axis shows the time required to process 
1000 edges. The lower diagram in Fig. 7 shows results from 
a similar query except that it specifies a label on the keyword 
vertex instead of the location vertex. As the figures indicate, 
selecting labels that correspond to vertices with increasing 
degrees increases the running time of the query. 

¹ http://data.nytimes.com 

Fig. 7. Results from queries finding four articles with a common keyword and 
location. The query graph contains one labeled location (top) / keyword (bottom) 
vertex. The labels are chosen with increasing degree for the experiment. The 
degrees are indicated in the legend. 



Fig. 8. Comparison with IncIsoMatch. MQD refers to the Multi-Relational 
Query Decomposition algorithm from this paper. (Plot annotation: all the 
IncIsoMatch measurements are clustered together due to the log-scale display.) 

Fig. 9. Query to find two authors co-authoring four papers. 

Fig. 10. Performance results for a query to find authors who co-authored 
four papers with a given author. The spikes in the plot can be attributed to 
the bursty nature of scientific publishing where authors target the same group 
of conferences and journals every year. 



Next, we compare our approach with the IncIsoMatch 
algorithm described by Fan et al. [5]. The VF2 algorithm [14] 
is adapted to implement the graph search functionality as 
outlined in IncIsoMatch. Our graph search implementation goes 
beyond the constraint checking algorithms outlined in VF2 
and implements additional filtering and verification techniques. 
The three queries used for the comparison tests are described below. 
We specify a label on the feature marked with † and select a 
label with one of the highest degrees for that vertex type. The 
queries are as follows: 1) find four articles with a common 
keyword and a common organization†, 2) find four articles 
with a common entity and a common keyword†, and 3) find 
four articles with a common entity and a common location†. 

Fig. 8 shows a performance improvement from our algorithm 
by several orders of magnitude. This improvement 
is attributed to the strictly 
ordered aggregation of partial matches in the SJ-Tree and the 
temporal property based optimizations. The performance gap 
between the processing times of the two algorithms increases 
as the graph grows larger. We attribute this to the nature of 
IncIsoMatch, which performs a search around every new 
edge in the graph. The search spans all vertices around the 
endpoints of the new edge as long as they are within $k$ hops, 
where $k$ is the diameter of the query graph. As the data graph 
grows denser, even for a query graph of small or modest 
size, the $k$-hop subgraph accumulates a large number of edges 
and the search becomes increasingly expensive. 

Fig. 11. Query to detect item recommendation patterns from a group of 
users. 

Fig. 12. Query processing time for the Tencent Weibo dataset for queries 
with varying selectivity. 

D. DBLP Co-Authorship Network 

Next, we build a multi-relational graph representation of the 
DBLP citation network with two types of entities: authors 
and articles. The author name and the title of the article are 
stored as labels of the respective vertices. We run a query that 
attempts to find an author (author 1) who has co-authored 
four papers with a specified author (author 2) (Fig. 9). We 
observe the degree distribution of the "author" vertices and 
select names with progressively increasing degrees. The results 
are shown in Fig. 10. The x-axis shows the growth of the graph 
in terms of the number of edges added and the y-axis shows the 
time required to process 100k edges. It can be seen that the 
performance of the algorithm is quite stable for a modestly 
large network with over 3M edges. 

E. Social Media 

Finally, we present our results on a data set collected from 
Tencent Weibo, a Chinese microblogging social network.² The 
data set provides a temporal history of item recommendations 
to registered users of the social network. An item could be any 
entity such as an organization, person or topic that may be 
recommended to a user. The response from the user is either 
an accept or a reject. The dataset provides a set of keywords 
describing each item and a set of keywords describing each user's 
profile, if available. We build a graph with four types of vertices: 
users, items, keywords and categories. The following edge 
types indicate relationships between entities: a) item belonging 
to a category, b) item described by keyword, c) acceptance of 
recommended item, d) rejection of recommended item and e) 
user described by keyword. 

¹ dblp.uni-trier.de/xml 

² www.kddcup2012.org/c/kddcup2012-track1 

Fig. 13. Query processing time for the Tencent Weibo dataset with temporal 
match pruning applied on every 5 million edges. 

Our test query detects a series of item acceptances by a 
group of users described by a common keyword. To put this 
in perspective, one can imagine an advertiser who introduces 
a new campaign or product and wants to monitor feedback 
from the user base in real time. Our test query graph has four 
user vertices, one item vertex and one keyword vertex 
(Fig. 11). For experimentation, we specify a label on the item and 
select labeled vertices with progressively increasing degrees. 
The degrees of the query labels chosen for the social media dataset 
are in the hundreds of thousands, compared to hundreds in the 
online news or citation network tests. The results are shown in Fig. 12. 
The x-axis shows the size of the graph as measured in millions 
of edges. The y-axis shows the time in seconds required to 
process every 100k edges. 

The figure suggests a clear trend. It shows that as the graph 
grows large the query processing time eventually rises sharply. 
This amplification is enabled by the scale of the dataset. It 
also shows that the rise happens earlier for low-selectivity 
queries where the specified label has higher degree in the 
graph. This is because the number of partial matches grows 
rapidly in the event of a successful search around a high degree 
vertex. Every partial match from the past can potentially be 
merged with the latest partial match, and the partial match 
collection grows combinatorially over time. This brings us to 
implementing the temporal window based pruning outlined 
in Section VII-B. We select the query with the highest degree 
label (degree(item) = 299199, Fig. 12), for which the rise in 
the processing time was sharpest. We set the time window 
$t_w$ to 1 day and prune the SJ-Tree after processing every 5 
million edges. The results from the windowing enabled search 
are shown in Fig. 13. Observe that the peaks in the processing 
time are smaller than the ones observed in Fig. 12 by an order of 
magnitude. 

This is an extremely promising result for practical 
applications. Fig. 13 suggests that it would take 10 seconds on 
average to process 100k edges, or 100 seconds for a million 
edges. In terms of throughput, this translates into 0.01 million 
edges/second or 864 million edges per day. At the time of 
this writing, high volume data streams such as Twitter receive 
nearly 300-400 million posts every day. Considering that every 
post or user action translates into multiple edges in a graph, 
one may expect on the order of billions of edges to arrive every day. 
Also note that we chose an extreme example for the last 
experiment. For a query of modest complexity, the throughput 
will be much higher for a temporal window enabled query 
processor. Thus, we believe this level of throughput on a very 
low-selectivity query gets us close to executing real-time graph 
queries on such high volume data streams. 
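
The throughput figures above follow from a simple unit conversion, which the following Python lines reproduce. The helper name is hypothetical; the inputs are the 100k edges and 10 seconds quoted in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def edges_per_day(edges_per_batch, seconds_per_batch):
    """Convert a per-batch processing time into a daily edge throughput."""
    return edges_per_batch / seconds_per_batch * SECONDS_PER_DAY


per_second = 100_000 / 10                 # 10,000 edges/s = 0.01 million edges/s
per_day = edges_per_day(100_000, 10)      # 864,000,000 edges/day
```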

IX. Conclusion and Future Work 

We present a novel query graph decomposition based 
approach for continuous queries on multi-relational graphs. We 
introduce the SJ-Tree structure, whose nodes represent the 
hierarchical decomposition of the query graph. The SJ-Tree 
systematically tracks the evolving matches in the data graph 
as they transition from smaller to larger matches based on the 
query graph decomposition. We present experimental analysis 
on several real-world datasets such as New York Times, DBLP 
and Tencent Weibo and show that our SJ-Tree based algorithm 
coupled with temporal optimizations clearly outperforms 
the state of the art [5] by multiple orders of magnitude. 

Our experiments demonstrate that it is possible to execute 
complex multi-relational graph queries in a real-time setting. 
To our knowledge, the results presented in this paper are 
the best reported performance for such queries. Our main 
theoretical contribution is to demonstrate that for a prominent 
class of multi-relational queries where the local search is 
cheap, we can execute graph queries in time that is polynomial 
in the height of the SJ-Tree. We also present an efficient 
algorithm to generate an SJ-Tree for any query graph by 
exploiting its structural and semantic characteristics. These 
initial results are highly promising in that they suggest possible 
ways of auto-selecting optimal values for query processing 
parameters based on the data distribution. Areas of future work 
include query planning for 1) complex graph queries where a 
complete temporal ordering may not be possible, 2) trade-offs 
between different query decomposition strategies and 3) exploring 
different query classes and determining the optimal trade-off 
between local search and joins in the SJ-Tree. 